add fused linear-loss function in Domino #965

duanhx1037 · 2025-04-08T19:36:33Z

Changes

Add a fused and chunked function for linear and cross-entropy loss computation in Domino, based on [Liger-Kernel](https://github.com/linkedin/Liger-Kernel).
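For readers unfamiliar with the technique, here is a minimal sketch of what "fused and chunked" linear plus cross-entropy means. It is not the code in this PR and not Liger-Kernel's Triton kernel: NumPy stands in for PyTorch to keep the example self-contained, and the function name `chunked_linear_cross_entropy` and the `chunk_size` parameter are hypothetical. The idea is to project one chunk of tokens onto the vocabulary at a time and compute both the loss and its gradients for that chunk, so the full tokens-by-vocabulary logits matrix never has to be held in memory.

```python
import numpy as np


def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=128):
    """Mean cross-entropy of (hidden @ weight.T) against targets, computed
    chunk by chunk so the full [N, V] logits matrix is never materialized.

    hidden:  [N, H] flattened token representations
    weight:  [V, H] vocabulary projection
    targets: [N]    integer class ids
    Returns (loss, grad_hidden, grad_weight).
    """
    n_tokens = hidden.shape[0]
    loss = 0.0
    grad_hidden = np.zeros_like(hidden)
    grad_weight = np.zeros_like(weight)

    for start in range(0, n_tokens, chunk_size):
        end = min(start + chunk_size, n_tokens)
        h = hidden[start:end]                          # [C, H]
        t = targets[start:end]                         # [C]
        rows = np.arange(end - start)

        logits = h @ weight.T                          # [C, V], one chunk only
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        log_z = np.log(np.exp(logits).sum(axis=1))     # log-partition per token
        loss += (log_z - logits[rows, t]).sum()

        # d(loss)/d(logits) = softmax - one_hot, scaled for the mean reduction.
        grad_logits = np.exp(logits - log_z[:, None])
        grad_logits[rows, t] -= 1.0
        grad_logits /= n_tokens

        grad_hidden[start:end] = grad_logits @ weight  # [C, H]
        grad_weight += grad_logits.T @ h               # [V, H]

    return loss / n_tokens, grad_hidden, grad_weight
```

Calling it with a small `chunk_size` and again with `chunk_size` equal to the number of tokens should give the same loss and gradients up to floating-point error; only the size of the temporary logits buffer changes.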

Effect on memory usage

Reduces training memory usage, especially peak memory usage in the vocabulary layer. With num-layers=4, seq-length=512, batch-size=8 in training/DeepSpeed-Domino/pretrain_gpt3_2.7b.sh, the average per-iteration memory usage measured by torch.cuda.max_memory_allocated() drops from 6.158 GB to 5.0458 GB (about 18% lower).

Effect on loss

The loss curve is almost identical with and without the fused function in a 1000-iteration experiment.

GuanhuaWang · 2025-04-08T20:38:55Z

Hi @duanhx1037 ,

Thanks for this PR. Please resolve the following:

- DCO issue
- Formatting issue; see the guide at https://github.com/deepspeedai/DeepSpeed/blob/master/CONTRIBUTING.md

Signed-off-by: dhx <[email protected]>

duanhx1037 requested a review from tjruwase as a code owner April 8, 2025 19:36

GuanhuaWang self-requested a review April 8, 2025 20:30

GuanhuaWang self-assigned this Apr 8, 2025

duanhx1037 added 2 commits April 9, 2025 16:37

add fused and chunked linear-loss function

10dcc3f

Signed-off-by: dhx <[email protected]>

update

f4eefa1

Signed-off-by: dhx <[email protected]>

duanhx1037 force-pushed the liger_integration branch from a297a52 to f4eefa1 Compare April 9, 2025 17:29

Trigger formatting check

8a02385

Signed-off-by: dhx <[email protected]>

duanhx1037 force-pushed the liger_integration branch from 06914b0 to 8a02385 Compare April 9, 2025 19:48

GuanhuaWang requested a review from hwchen2017 April 10, 2025 03:04
